
Open Research Exeter

BranchClust: a phylogenetic algorithm for selecting gene families

Author: AG Murzin
AJ Enright
AP Vogler
C Winstanley
CM Zmasek
DF Feng
DL Fulton
E Hilario
EL Sonnhammer
EV Koonin
F Chevenet
G Perriere
H Ochman
HA Schmidt
J Felsenstein
J Peter Gogarten
J Raymond
JA Lake
JD Thompson
JP Gogarten
JP Gogarten
JP Gogarten
K Oshima
KP O'Brien
L Olendzenski
LB Koski
M Remm
Maria S Poptsova
MG Montague
N Saitou
O Zhaxybayeva
O Zhaxybayeva
O Zhaxybayeva
P Lapierre
RC Edgar
RL Charlebois
RL Tatusov
S Guindon
S Tsutsumi
S van Dongen
SF Altschul
SF Altschul
SR Eddy
T Dagan
TJ Harlow
U Dobrindt
WM Fitch
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The former are easy to automate, but they usually do not distinguish between paralogs and orthologs, or they restrict the number of taxa. Phylogenetic methods are often based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in the case of prokaryotes, is that the species tree is often unknown, and that in analyses of single gene families the branching order between related organisms is frequently unresolved. RESULTS: Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at . CONCLUSION: BranchClust outperforms the Reciprocal Best Blast hit method in selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees.
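
For orientation, the Reciprocal Best Blast hit baseline named in the conclusion can be sketched as below. This is a minimal illustration of that baseline only, not of BranchClust itself; the input format (tuples of query gene, subject gene, score), the function names and the example scores are all assumptions made for the sketch.

```python
def best_hits(hits):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b1", 180.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 150.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input, a2 and b2 each point at a gene whose own best hit lies elsewhere, so they are left unpaired. Such leftover paralogs are the genes a purely pairwise criterion cannot place in a family.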

Public Library of Science (PLOS)

Reassessment of the Lineage Fusion Hypothesis for the Origin of Double Membrane Bacteria

Author: Anna G. Green
B Boussau
Gregory P. Fournier
I Sharon
IC Sutcliffe
J Raymond
J Xiong
J. Peter Gogarten
JA Lake
JA Lake
JA Servin
Jonathan H. Badger
JP Gogarten
Kristen S. Swithers
L Margulis
N Igarashi
O Zhaxybayeva
Pascal Lapierre
RD Finn
RE Valas
RG Beiko
RS Gupta
S Gribaldo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

In 2009, James Lake introduced a new hypothesis in which reticulate phylogeny reconstruction is used to elucidate the origin of Gram-negative bacteria (Nature 460: 967–971). The presented data supported an origin of the Gram-negative bacteria in an ancient endosymbiosis between the Actinobacteria and Clostridia. His conclusion was based on a presence-absence analysis of protein families that divided all prokaryotes into five groups: Actinobacteria, Double Membrane bacteria (DM), Clostridia, Archaea and Bacilli. Of these five groups, the DM are by far the largest and most diverse. While the fusion hypothesis for the origin of double membrane bacteria is enticing, we show that the signal supporting an ancient symbiosis is lost when the DM group is broken down into smaller subgroups. We conclude that the signal detected in James Lake's analysis in part results from a systematic artifact due to group size and diversity combined with low levels of horizontal gene transfer. Funding: Exobiology Program (U.S.) (Grant NNX08AQ10G); Assembling the Tree of Life (Program) (Grant DEB 0830024)

CiteSeerX

DSpace@MIT

Evidence for acquisition of virulence effectors in pathogenic chytrids

Author: AJ Phillips
BJ Haas
C Weldon
EA O'Brien
EB Rosenblum
Guiling Sun
H Bannai
H Ochman
HA Schmidt
J Felsenstein
J Huang
J Tian
J Zhang
J-C Vié
JD Bendtsen
JE Longcore
Jinling Huang
JM Kiesecker
JP Gogarten
JP Gogarten
K Goka
Kyle Summers
L Berger
LF Skerratt
M Barinaga
M Suyama
MA Larkin
MC Fisher
O Emanuelsson
P Rice
P van West
PJ Keeling
RC Edgar
S Guindon
S Vermout
SR Eddy
TA Torto
Tiffany Kosch
TL Friesen
TM Keane
TY James
W Ziebuhr
WS Wong
X Gu
X Gu
X Gu
Z Yang
Z Yang
Z Yang
Zefeng Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Background: The decline in amphibian populations across the world is frequently linked to infection by the chytrid fungus Batrachochytrium dendrobatidis (Bd). This is particularly perplexing because Bd was discovered only in 1999 and no chytrid fungus had previously been identified as a vertebrate pathogen. Results: In this study, we show that two large families of known virulence effector genes, crinkler (CRN) proteins and serine peptidases, were acquired by Bd from oomycete pathogens and bacteria, respectively. Both families were duplicated after their acquisition by Bd. Additional selection analyses indicate that both families evolved under strong positive selection, suggesting that they are involved in the adaptation of Bd to its hosts. Conclusions: We propose that the acquisition of virulence effectors, in combination with habitat disruption and climate change, may have driven the Bd epidemics and the decline in amphibian populations. This finding provides a starting point for biochemical investigations of chytridiomycosis.

ResearchOnline@JCU

ResearchOnline at James Cook University

The University of North Carolina at Greensboro

University of Melbourne Institutional Repository

ScholarShip

Molecular Evolution of Aminoacyl tRNA Synthetase Proteins in the Early History of Life

Author: A Suzuki
Cheryl P. Andam
CP Andam
CP Andam
CP Andam
CR Woese
D Korencic
EB Newman
Eric J. Alm
G Srinivasan
GM Nagel
GP Fournier
Gregory P. Fournier
H Grosjean
HS Kim
J Dutheil
J. Peter Gogarten
JA Krzycki
JD Fischer
JF Xiao
JP Gogarten
M Ibba
M Wirtz
O Zhaxybayeva
P Schimmel
RC Edgar
RD Knight
RH White
S Bilokapic
S Cusack
S Fukai
S Guindon
SA Benner
T Suzuki
V Hanson-Smith
Y Ikeuchi
YI Wolf
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2011

Aminoacyl-tRNA synthetases (aaRS) consist of several families of functionally conserved proteins essential for translation and protein synthesis. Like nearly all components of the translation machinery, most aaRS families are universally distributed across cellular life, being inherited from the time of the Last Universal Common Ancestor (LUCA). However, unlike the rest of the translation machinery, aaRS have undergone numerous ancient horizontal gene transfers, with several independent events detected between domains, and some possibly involving lineages diverging before the time of LUCA. These transfers reveal the complexity of molecular evolution at this early time, and the chimeric nature of genomes within cells that gave rise to the major domains. Additionally, given the role of these protein families in defining the amino acids used for protein synthesis, sequence reconstruction of their pre-LUCA ancestors can reveal the evolutionary processes at work in the origin of the genetic code. In particular, sequence reconstructions of the paralog ancestors of isoleucyl- and valyl-RS provide strong empirical evidence that, at least for this divergence, the genetic code did not co-evolve with the aaRSs; rather, both amino acids were already part of the genetic code before their cognate aaRSs diverged from their common ancestor. The implications of this observation for the early evolution of RNA-directed protein biosynthesis are discussed. Funding: National Science Foundation (U.S.) (Grant DEB 0830024); National Science Foundation (U.S.) (Grant DEB 0936234); United States. National Aeronautics and Space Administration (NASA Postdoctoral Fellowship

DSpace@MIT

Bootstrap, Bayesian probability and maximum likelihood mapping: exploring new tools for comparative genome analyses

Author: A Drummond
AH Schinkel
B Rannala
B Snel
C Brochier
C Brochier
CR Woese
CR Woese
CR Woese
DE Graham
DH Huson
DR Walker
DT Jones
E Denamur
EV Koonin
EV Koonin
F Tekaia
J Felsenstein
J Felsenstein
J Lin
J Xiong
JD Thompson
JG Lawrence
JP Gogarten
JP Gogarten
JP Huelsenbeck
K Strimmer
K Strimmer
KG Karol
KS Makarova
L Olendzenski
MG Montague
MR Goddard
N Cermakian
RF Doolittle
RL Tatusov
RL Tatusov
RL Tatusov
RS Gupta
RS Gupta
RS Gupta
S Ribeiro
S Rousvoal
SF Altschul
SF Altschul
ST Fitz-Gibbon
T Sicheritz-Ponten
W Hennig
W Ludwig
WF Doolittle
WJ Murphy
WR Pearson
Y Hasegawa
YI Wolf
YI Wolf
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2002

BACKGROUND: Horizontal gene transfer (HGT) played an important role in shaping microbial genomes. In addition to genes under sporadic selection, HGT also affects housekeeping genes and those involved in information processing, even ribosomal RNA encoding genes. Here we describe tools that provide an assessment and graphic illustration of the mosaic nature of microbial genomes. RESULTS: We adapted the Maximum Likelihood (ML) mapping to the analyses of all detected quartets of orthologous genes found in four genomes. We have automated the assembly and analyses of these quartets of orthologs given the selection of four genomes. We compared the ML-mapping approach to more rigorous Bayesian probability and Bootstrap mapping techniques. The latter two approaches appear to be more conservative than the ML-mapping approach, but qualitatively all three approaches give equivalent results. All three tools were tested on mitochondrial genomes, which presumably were inherited as a single linkage group. CONCLUSIONS: In some instances of interphylum relationships we find nearly equal numbers of quartets strongly supporting the three possible topologies. In contrast, our analyses of genome quartets containing the cyanobacterium Synechocystis sp. indicate that a large part of the cyanobacterial genome is related to that of low GC Gram positives. Other groups that had been suggested as sister groups to the cyanobacteria contain many fewer genes that group with the Synechocystis orthologs. Interdomain comparisons of genome quartets containing the archaeon Halobacterium sp. revealed that Halobacterium sp. shares more genes with Bacteria that live in the same environment than with Bacteria that are more closely related based on rRNA phylogeny. Many of these genes encode proteins involved in substrate transport and metabolism and in information storage and processing. The performed analyses demonstrate that relationships among prokaryotes cannot be accurately depicted by or inferred from the tree-like evolution of a core of rarely transferred genes; rather prokaryotic genomes are mosaics in which different parts have different evolutionary histories. Probability mapping is a valuable tool to explore the mosaic nature of genomes.
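
The quartet tallies described above can be illustrated with a short sketch. For four sequences there are three possible unrooted topologies, and likelihood mapping weights each one by its likelihood divided by the sum of the three. The code below assumes the three log-likelihoods per gene quartet have already been computed by some external program; the function names, the 0.9 support threshold and the example values are assumptions made for illustration, not details taken from the abstract.

```python
import math


def topology_weights(log_likelihoods):
    """Convert three log-likelihoods into weights that sum to 1."""
    top = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow.
    relative = [math.exp(value - top) for value in log_likelihoods]
    total = sum(relative)
    return [value / total for value in relative]


def supported_topology(log_likelihoods, threshold=0.9):
    """Return the index (0, 1 or 2) of a strongly supported topology, else None."""
    weights = topology_weights(log_likelihoods)
    best = max(range(len(weights)), key=lambda i: weights[i])
    return best if weights[best] >= threshold else None


def tally(quartets, threshold=0.9):
    """Count how many gene quartets strongly support each topology."""
    counts = {0: 0, 1: 0, 2: 0, None: 0}
    for log_likelihoods in quartets:
        counts[supported_topology(log_likelihoods, threshold)] += 1
    return counts


# Hypothetical log-likelihoods for three gene quartets from the same four genomes.
quartets = [
    [-1000.0, -1005.0, -1010.0],  # clear support for topology 0
    [-1000.0, -1000.1, -1000.2],  # unresolved
    [-2010.0, -2000.0, -2012.0],  # clear support for topology 1
]
counts = tally(quartets)
```

A genome with a single evolutionary history would be expected to pile most resolved quartets onto one topology, whereas a mosaic genome spreads them across two or three.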

OrgConv: detection of gene conversion using consensus sequences and its application in plant mitochondrial and chloroplast homologs

Author: A Rambaut
AC Springman
C Fraser
C Wiuf
D Martin
D Posada
D Posada
D Posada
G Drouin
G McGuire
J Majewski
J Maynard Smith
J Zhou
JM Archibald
JM Smith
JP Gogarten
JP Jaramillo-Correa
JP Mower
KH Wolfe
M Stratz
M Vulic
MJ Moore
MO Salminen
P Lopez
PR Marri
RK Jansen
RT Papke
S Sawyer
SR Miller
T Kubo
T Lefebure
TC Bruen
TJ Barkman
U Bergthorsson
VV Goremykin
W Hao
Weilong Hao
Y Inagaki
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The ancestry of mitochondria and chloroplasts traces back to separate endosymbioses of once free-living bacteria. The highly reduced genomes of these two organelles therefore contain very distant homologs that only recently have been shown to recombine inside the mitochondrial genome. Detection of gene conversion between mitochondrial and chloroplast homologs was previously impossible due to the lack of suitable computer programs. Recently, I developed a novel method and have, for the first time, discovered recurrent gene conversion between chloroplast and mitochondrial genes. The method will further our understanding of plant organellar genome evolution and help identify and remove gene regions with incongruent phylogenetic signals for several genes widely used in plant systematics. Here, I implement this method and make it available through a user-friendly web interface. Results: OrgConv (Organellar Conversion) is a computer package developed for detection of gene conversion between mitochondrial and chloroplast homologous genes. OrgConv is available in two forms: source code that can be installed and run on a Linux platform, and a web interface that can be used from multiple operating systems. The input files of the program are two multiple sequence alignments from different organellar compartments in FASTA format. The program compares every examined sequence against the consensus sequence of each sequence alignment rather than exhaustively examining every possible combination. Making use of consensus sequences significantly reduces the number of comparisons and therefore reduces overall computational time, which allows for analysis of very large datasets. Most importantly, with the significantly reduced number of comparisons, the statistical power remains high in the face of correction for multiple tests. Conclusions: Both the source code and the web interface of OrgConv are available for free from the OrgConv website http://www.indiana.edu/~orgconv. Although OrgConv has been developed with a main focus on detection of gene conversion between mitochondrial and chloroplast genes, it may also be used for detection of gene conversion between any two distinct groups of homologous sequences.
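
The consensus-based comparison described in the results can be sketched as follows. This only illustrates why consensus sequences cut the number of comparisons; it is not OrgConv's code and does not reproduce its statistical test. The majority-rule consensus, the mismatch count, the function names and the toy alignments are assumptions.

```python
from collections import Counter


def consensus(alignment):
    """Majority-rule consensus of equal-length aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*alignment)
    )


def mismatches(sequence, reference):
    """Count aligned positions at which two sequences differ."""
    return sum(1 for a, b in zip(sequence, reference) if a != b)


# Hypothetical aligned homologs from the two organellar compartments.
mitochondrial = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
chloroplast = ["TTGTACCC", "TTGTACCC", "TTGAACCC"]

mito_consensus = consensus(mitochondrial)
cp_consensus = consensus(chloroplast)

# A query that resembles the mitochondrial consensus at its start and the
# chloroplast consensus at its end, as a converted region might.
query = "ACGTACCC"
to_mito = mismatches(query, mito_consensus)
to_cp = mismatches(query, cp_consensus)
```

With N sequences in one alignment and M in the other, each sequence is compared against two consensus sequences rather than against every sequence of the other set, so the work grows with N + M instead of N × M. That reduction is what keeps the multiple-testing correction mild.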

In silico prioritisation of candidate genes for prokaryotic gene function discovery: an application of phylogenetic profiles

Author: C Médigue
C Perez-Iratxeta
CM Fraser
DM Raskin
EA Adie
EA Adie
EC Lin
EM Marcotte
Enrico Coiera
Frank PY Lin
FS Turner
G Michal
IH Witten
J Freudenberg
J Wu
JP Gogarten
JP Vert
KJ Gaulton
M Kanehisa
M Pellegrini
MY Galperin
N López-Bigas
N Tiffin
PD Karp
R Jothi
Ruiting Lan
S Aerts
Vitali Sintchenko
WJ Kent
Y Yamanishi
Y Zheng
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task. Results: Using gene occurrence patterns in sample genomes, we developed two CGP methods (statistical and inductive CGP) to assist with the discovery of bacterial gene functions. Statistical CGP exploits the differences in gene frequency between phenotypic groups, while inductive CGP applies supervised machine learning to identify gene occurrence patterns across genomes. Three rediscovery experiments were designed to evaluate the CGP frameworks. The first experiment attempted to rediscover peptidoglycan genes with 417 published genome sequences. Both CGP methods achieved best areas under the receiver operating characteristic curve (AUC) of 0.911 in the Escherichia coli K-12 (EC-K12) and 0.978 in the Streptococcus agalactiae 2603 (SA-2603) genomes, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 genome examples in each group in the rediscovery of the peptidoglycan metabolism genes. In the second experiment, a maximum 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes in EC-K12. The last experiment attempted to rediscover genes from 31 metabolic pathways in SA-2603, where 14 pathways achieved AUC >0.9 and 28 pathways achieved AUC >0.8 with the best inductive CGP algorithms. Conclusion: Our results demonstrate that the two CGP methods can assist with the study of functionally uncategorised genomic regions and discovery of bacterial gene-function relationships. Our rediscovery experiments also provide a set of standard tasks against which future methods may be compared.
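
The idea behind statistical CGP, ranking genes by how differently they occur in genomes with and without a phenotype, can be sketched as below together with the AUC used for evaluation. The abstract does not give the actual scoring statistic, so the plain frequency difference here is an assumption, as are the function names and the toy profiles.

```python
def frequency_difference(profile, has_phenotype):
    """Gene frequency in genomes with the phenotype minus frequency in those without.

    `profile` holds 1 (gene present) or 0 (absent) per genome, in the same
    order as the booleans in `has_phenotype`.
    """
    with_trait = [p for p, flag in zip(profile, has_phenotype) if flag]
    without_trait = [p for p, flag in zip(profile, has_phenotype) if not flag]
    return sum(with_trait) / len(with_trait) - sum(without_trait) / len(without_trait)


def auc(scores, labels):
    """Probability that a relevant gene outranks an irrelevant one (ties count half)."""
    relevant = [s for s, flag in zip(scores, labels) if flag]
    irrelevant = [s for s, flag in zip(scores, labels) if not flag]
    wins = sum(
        1.0 if r > i else 0.5 if r == i else 0.0
        for r in relevant
        for i in irrelevant
    )
    return wins / (len(relevant) * len(irrelevant))


# Hypothetical data: four genomes, the first two show the phenotype.
has_phenotype = [True, True, False, False]
profiles = {
    "geneA": [1, 1, 0, 0],
    "geneB": [1, 0, 0, 0],
    "geneC": [1, 1, 1, 1],
    "geneD": [0, 1, 1, 0],
}
scores = [frequency_difference(p, has_phenotype) for p in profiles.values()]
# Known answers for a rediscovery experiment: geneA and geneB are relevant.
labels = [True, True, False, False]
result = auc(scores, labels)
```

A rediscovery experiment in this style hides the known annotations, ranks all genes by score, and checks how well the hidden genes rise to the top.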

Macquarie University ResearchOnline

UNSWorks

Conservation of intron and intein insertion sites: implications for life histories of parasitic genetic elements

Background: Inteins and introns are genetic elements that are removed from proteins and RNA after translation or transcription, respectively. Previous studies have suggested that these genetic elements are found in conserved parts of the host protein. To our knowledge this type of analysis has not been done for group II introns residing within a gene. Here we provide quantitative statistical support from an analysis of proteins that host inteins, group I introns, group II introns and spliceosomal introns across all three domains of life. Results: To determine whether or not inteins, group I, group II, and spliceosomal introns are found preferentially in conserved regions of their respective host protein, conservation profiles were generated and intein and intron positions were mapped to the profiles. Fisher's combined probability test was used to determine the significance of the distribution of insertion sites across the conservation profile for each protein. For a subset of studied proteins, the conservation profile and insertion positions were mapped to protein structures to determine if the insertion sites correlate with regions of functional activity. All inteins and most group I introns were found to be preferentially located within conserved regions; in contrast, a bacterial intein-like protein, group II and spliceosomal introns did not show a preference for conserved sites. Conclusions: These findings demonstrate that inteins and group I introns are found preferentially in conserved regions of their respective host proteins. Homing endonucleases are often located within inteins and group I introns and these may facilitate mobility to conserved regions. Insertion at these conserved positions decreases the chance of elimination, and slows deletion of the elements, since removal of the elements has to be precise so as not to disrupt the function of the protein. Furthermore, functional constraints on the targeted site make it more difficult for hosts to evolve immunity to the homing endonuclease. Therefore, these elements will better survive and propagate as molecular parasites in conserved sites. In contrast, spliceosomal introns and group II introns do not show significant preference for conserved sites and appear to have adopted a different strategy to evade loss.
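
Fisher's combined probability test, named in the results, pools k independent p-values through the statistic X² = −2 Σ ln pᵢ, which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. A minimal standard-library sketch is given below. How the individual p-values per insertion site were obtained is not stated in the abstract, so the example inputs are hypothetical.

```python
import math


def fisher_combined(p_values):
    """Return (statistic, combined p-value) for independent p-values.

    For an even number of degrees of freedom (2k), the chi-squared upper
    tail has the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0  # the i = 0 term of the series
    series = 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, math.exp(-half) * series


# Hypothetical per-site p-values for one host protein.
statistic, combined = fisher_combined([0.04, 0.10, 0.03])
```

With a single p-value the combined result equals the input, which is a convenient sanity check on the closed-form tail.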

Public Library of Science (PLOS)

Phylogenomic Analysis of Marine Roseobacters

Author: A Buchan
A Stamatakis
C Dutta
Carl Kingsford
Cathy H. Wu
CH Wu
CJ Creevey
CM Thomas
D Posada
DF Robinso
E Bapteste
E Lerat
E Susko
F Abascal
G Bouxin
G Talavera
GT Taylor
H Ochman
H Shimodaira
H Shimodaira
HA Schmidt
Hongzhan Huang
I Wagner-Dobler
I Wagner-Dobler
J Bergsten
J Castresana
J Felsenstein
JA Eisen
JD Thompson
JP Gogarten
JP Gogarten
JP Huelsenbeck
JR Brown
Kai Tang
KH Tang
L Li
LM Schouls
MA Moran
MS Poptsova
N Galtier
Nianzhi Jiao
NZ Jiao
O Zhaxybayeva
O Zhaxybayeva
R Jain
R Seshadri
RD Page
RG Beiko
RG Beiko
RL Charlebois
RL Tatusov
RS Poretsky
S Guindon
SF Altschul
SJ Sorensen
SM Sowell
T Brinkhoff
T Shi
TR Miller
V Daubin
VM Markowitz
Y Zhang
Y Zhao
ZS Kolber
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Members of the Roseobacter clade, which play a key role in the biogeochemical cycles of the ocean, are diverse and abundant, comprising 10–25% of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. Methodology/Principal Findings: In this study about 1,200 likely orthologous protein families were identified from 17 Roseobacter genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA) and the approximately unbiased (AU) and Shimodaira–Hasegawa (SH) tests. A core set of 694 genes that carry a vertical descent signal and are resistant to horizontal gene transfer (HGT) was used to reconstruct a robust organismal phylogeny. In addition, we identified the 109 genes most likely to have undergone HGT. The core set contains genes that encode the ribosomal apparatus, ABC transporters and chaperones often found in environmental metagenomic and metatranscriptomic data. The genes in the core set are spread uniformly among the various functional classes and biological processes. Conclusions/Significance: Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzymes fo

CiteSeerX